
Update references to go-enry in documentation #7198



Merged (1 commit), Jan 10, 2025


@lildude (Member) commented Jan 10, 2025

Description

GitHub's search no longer uses go-enry. It now uses an internally developed library for language detection. That library still draws on Linguist, so the same delays and limitations apply.

This PR updates the docs to reflect we no longer use go-enry.

Checklist:

N/A

@lildude lildude requested a review from a team as a code owner January 10, 2025 09:54
@lildude lildude merged commit 09880c7 into main Jan 10, 2025
8 checks passed
@lildude lildude deleted the lildude/update-docs-no-go-enry branch January 10, 2025 10:01
@DecimalTurn (Contributor)

Can we know what language the internal library is written in? It would help make sure that the regex syntax we use in Linguist is compatible.

@lildude (Member, Author) commented Jan 13, 2025

It's written in Rust.

@DecimalTurn (Contributor) commented Feb 16, 2025

Is it the project mentioned here?

I'm asking because I'd like to confirm which engine is used for regex patterns. For instance, if they use this implementation of regex, there would be no support for possessive quantifiers (or most other non-RE2 regex syntax), as discussed here, which means we should probably avoid them in, or remove them from, the Linguist heuristics.
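To make the concern concrete, here is a rough Python sketch (the helper `find_unsupported` is hypothetical, not part of Linguist) that scans a heuristic pattern for constructs that, assuming the engine is the Rust `regex` crate, would be rejected: possessive quantifiers, atomic groups, lookaround, and backreferences. That crate only supports syntax it can match in linear time, while Ruby's backtracking engine accepts all four. It is a scanner, not a full parser: it skips escapes and character classes but does not understand every Onigmo extension (nested classes, for example).

```python
import re

# Counted repetition such as {3}, {2,} or {2,5}.
_COUNTED = re.compile(r"\{\d+(?:,\d*)?\}")

# Group openers that backtracking engines accept but a linear-time engine rejects.
_UNSUPPORTED_GROUPS = [
    ("(?<=", "lookbehind"),
    ("(?<!", "negative lookbehind"),
    ("(?=", "lookahead"),
    ("(?!", "negative lookahead"),
    ("(?>", "atomic group"),
]


def find_unsupported(pattern: str) -> list[tuple[int, str]]:
    """Return (index, construct) pairs for syntax a linear-time engine won't accept.

    Tracks escapes and character classes so that e.g. [*+] or \\*+ are not
    mistaken for possessive quantifiers.
    """
    issues: list[tuple[int, str]] = []
    i, n = 0, len(pattern)
    in_class = False
    while i < n:
        c = pattern[i]
        if c == "\\":
            nxt = pattern[i + 1] if i + 1 < n else ""
            # \1..\9 and \k<name> are backreferences (outside a character class).
            if not in_class and nxt and (nxt in "123456789" or nxt == "k"):
                issues.append((i, "backreference"))
            i += 2
            continue
        if in_class:
            if c == "]":
                in_class = False
            i += 1
            continue
        if c == "[":
            in_class = True
            i += 1
            # A ']' directly after '[' or '[^' is a literal, not the end of the class.
            if i < n and pattern[i] == "^":
                i += 1
            if i < n and pattern[i] == "]":
                i += 1
            continue
        if pattern.startswith("(?", i):
            for prefix, name in _UNSUPPORTED_GROUPS:
                if pattern.startswith(prefix, i):
                    issues.append((i, name))
                    break
            i += 2  # skip "(?" so the '?' is not read as a quantifier
            continue
        # Find the end of a quantifier starting at i, if there is one.
        if c in "*+?":
            j = i + 1
        elif c == "{" and (m := _COUNTED.match(pattern, i)):
            j = m.end()
        else:
            i += 1
            continue
        if j < n and pattern[j] == "+":
            issues.append((i, "possessive quantifier"))
            j += 1
        elif j < n and pattern[j] == "?":
            j += 1  # lazy modifier, which linear-time engines do support
        i = j
    return issues
```

For example, `find_unsupported(r"^\s*+#include")` returns `[(3, "possessive quantifier")]`. In that case the fix is simply `^\s*#include`: since `\s` can never consume `#`, the possessive and greedy forms match exactly the same strings.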

@lildude (Member, Author) commented Feb 17, 2025

> Is it the project mentioned here?

Yup.

> I'm asking because I'd like to confirm which engine is used for regex patterns. For instance, if they use this implementation of regex, there would be no support for possessive quantifiers (or most other non-RE2 regex syntax), as discussed here, which means we should probably avoid them in, or remove them from, the Linguist heuristics.

That's the implementation we use, and you raise a good point. Thanks, and thanks for #7238.

If you've got an urge to fix more regexes, we have several that need fixing because they don't run in linear time or are vulnerable to ReDoS. I've started main...lildude/linear-regex-redos to add a test and clean them up as and when I have the time, so feel free to keep cleaning up our regexes.
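One cheap static check that a test like this could include is flagging nested quantifiers: a repeated group that itself contains a repetition, such as `(a+)+`, which can backtrack exponentially in engines like Ruby's. This is an assumption about the approach, not a description of what the branch does. The sketch below (the helper `has_nested_quantifier` is hypothetical) implements that heuristic. It misses other ReDoS shapes, such as overlapping alternation like `(a|a)*`, and it is irrelevant to the Rust engine itself, which guarantees linear-time matching regardless.

```python
import re

# Counted repetition such as {3}, {2,} or {2,5}.
_COUNTED = re.compile(r"\{\d+(?:,\d*)?\}")


def _repeats_at(pattern: str, i: int) -> bool:
    """True if a repetition operator (*, + or {m,n}) starts at index i."""
    if i >= len(pattern):
        return False
    return pattern[i] in "*+" or _COUNTED.match(pattern, i) is not None


def has_nested_quantifier(pattern: str) -> bool:
    """Flag a repeated group that itself contains a repetition, e.g. (a+)+."""
    # One flag per open group: does that group contain a repetition so far?
    stack = [False]
    i, n = 0, len(pattern)
    in_class = False
    while i < n:
        c = pattern[i]
        if c == "\\":
            i += 2  # escaped character, e.g. \( or \+, is a literal
            continue
        if in_class:
            if c == "]":
                in_class = False
            i += 1
            continue
        if c == "[":
            in_class = True
            i += 1
            # A ']' directly after '[' or '[^' is a literal, not the end of the class.
            if i < n and pattern[i] == "^":
                i += 1
            if i < n and pattern[i] == "]":
                i += 1
            continue
        if c == "(":
            stack.append(False)
            i += 2 if pattern.startswith("(?", i) else 1
            continue
        if c == ")" and len(stack) > 1:
            inner = stack.pop()
            i += 1
            repeated = _repeats_at(pattern, i)
            if repeated and inner:
                return True  # star height > 1, e.g. (a+)+ or (\w+\s?)*
            if repeated or inner:
                stack[-1] = True  # the enclosing group now contains a repetition
            continue
        if _repeats_at(pattern, i):
            stack[-1] = True
        i += 1
    return False
```

Optional `?` is deliberately not counted as repetition, so `(a+)?b` passes while `(a+)+$` is flagged. A check like this is best treated as a prompt for review rather than proof that a pattern is safe or unsafe.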

@DecimalTurn (Contributor)

@lildude Thanks for confirming and yes, I do have the intention to work on some more regex adjustments. I actually have some more changes that I was going to submit PRs for, so I'll get on with it.

2 participants